Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems
Authors
Abstract
The relaxed semantics and rich functionality of the one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such implementations suffers from the fact that current MPI RMA implementations typically incur a large overhead when the source and target of a communication request share a common, local physical memory. In this paper, we present an optimized PGAS-like runtime system which uses the new MPI-3 shared-memory extensions to serve intra-node communication requests and MPI-3 one-sided communication primitives to serve inter-node communication requests. The performance of our runtime system is evaluated on a Cray XC40 system through low-level communication benchmarks, a random-access benchmark, and a stencil kernel. The results of the experiments demonstrate that the performance of our hybrid runtime system matches the performance of low-level RMA libraries for intra-node transfers, and that of MPI-3 for inter-node transfers.
Similar resources
XpressSpace: a programming framework for coupling partitioned global address space simulation codes
Complex coupled multiphysics simulations are playing increasingly important roles in scientific and engineering applications such as fusion, combustion, and climate modeling. At the same time, extreme scales, increased levels of concurrency, and the advent of multicores are making programming of high-end parallel computing systems on which these simulations run challenging. Although partitioned...
GSHMEM: A Portable Library for Lightweight, Shared-Memory, Parallel Programming
As parallel computer systems evolve to address the insatiable need for higher performance in applications from a broad range of science domains, and exhibit ever deeper and broader levels of parallelism, the challenge of programming productivity comes to the forefront. Whereas these systems (and, in some cases, devices) are often constructed as distributed-memory architectures to facilitate eas...
Is OpenMP for Grids?
This paper presents an overview of an ongoing NSF-sponsored project for the study of runtime systems and compilers to support the development of efficient OpenMP parallel programs for distributed memory systems. The first part of the paper discusses a prototype compiler, now under development, that will accept OpenMP and will target TreadMarks, a Software Distributed Shared Memory System (SDSM),...
PGAS Models using an MPI Runtime: Design Alternatives and Performance Evaluation
Programming models play a critical role in designing scalable applications. In the past few decades, MPI [3] has become the de facto programming model for writing parallel applications. At the same time, alternative programming models such as Partitioned Global Address Space (PGAS) programming models are gaining traction due to their asynchrony, ability to read/write distributed data structures a...
A Performance Model for Unified Parallel C
This research is a performance-centric investigation of Unified Parallel C (UPC), a parallel programming language that belongs to the Partitioned Global Address Space (PGAS) language family. The objective is to develop a performance modeling methodology that targets UPC but can be generalized for other PGAS languages. The performance modeling methodology relies on platform characterization and...
Publication year: 2015